A trellis code is a "sliding window" method of encoding a binary data stream
into a sequence of real numbers that are input to a noisy transmission channel.
When a trellis code is used to encode data at the rate of k bits/channel symbol,
each channel input will depend not only on the most recent block of k data 
bits to enter the encoder but will also depend on, say, the v bits preceding this 
block. The v bits determine the state of the encoder and the most recent block 
of k bits generates the channel symbol conditional on the encoder state. The 
performance of trellis codes, like that of block codes, depends on a suitably 
defined minimum-distance property of the code. In this paper we obtain upper 
bounds on this minimum distance that are simple functions of k and v. These 
results also provide a lower bound on the number of states required to achieve 
a specific coding gain. 

I. INTRODUCTION 

In this paper we are concerned with transmission of digital data 
using trellis codes to gain some noise immunity over standard uncoded 
methods. We assume pulse amplitude modulation whereby the values
of the transmitted data are estimated from a sequence of samples $r_j$
generated by a receiver. These output samples are often modeled as

$$r_j = x_j + n_j, \tag{1}$$

where $x_j$ is a real number sequence determined by the source sequence
of binary data and $n_j$ is an independent zero-mean white Gaussian
noise sequence of variance $\sigma^2$. For uncoded transmission at rate $k$ bits/
symbol, $x_j$ takes on one of $2^k$ fixed values. Error performance may be
improved using coding, but if we insist on transmitting at rate $k$ bits/
symbol then we must increase the number of possible values taken by
the $x_j$. We can choose either a block or tree (trellis) structure for the
code. In this paper we consider only trellis codes. The performance of
trellis codes, like that of block codes, depends on a suitably defined
minimum-distance property of the code. We obtain upper bounds on
this minimum distance, $d_{\min}$. The analogous problem for block codes
is well studied, but little work has been done on distance properties of
trellis codes [1,2].

We assume the following model for encoding the binary data (i.e., 
choosing the x j ) prior to transmission over the Gaussian channel. 
Regard the incoming binary digits as partitioned into blocks of k 
consecutive bits. The real number $x_j$ is to be a time-independent
function of the most recent $k$-bit block and also of the $v$ bits preceding
this block. Thus if $\{a_i\}$ is the binary data sequence, we assume

$$x_j = x(a_{jk}, a_{jk-1}, \ldots, a_{jk-(k-1)}, a_{(j-1)k}, \ldots, a_{(j-1)k-(v-1)}). \tag{2}$$

This is an example of a $k$-bit/symbol trellis code. We regard the $v$
"old" bits as determining the state of the encoder (there are $2^v$ possible
states) and the $k$ "new" bits as generating the channel symbol (there
are $2^k$ possible symbols) conditional on the encoder state. The trellis
structure is made evident by drawing an example. Fig. 1 shows the
case $k = 1$, $v = 2$.

If, in this example, the encoder is in state (00) at time $j$, and the
next bit (block of $k = 1$ bits) to be transmitted is a 1, then we transmit
the symbol x(100) and move to state (10).
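The sliding-window rule (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `window_indices`, the all-zero initial state, and the identification of the label x(s_0 ... s_{v+k-1}) with the integer s_0 + 2 s_1 + ... (the convention introduced in Section II) are choices made here.

```python
def window_indices(bits, k, v):
    """Return, for each k-bit block, the integer index of the edge label x(window).

    The window lists the k newest bits first (newest bit leading), followed by
    the v state bits, mirroring the argument order of x(...) in (2).
    Assumption: the encoder starts in the all-zero state.
    """
    state = [0] * v
    indices = []
    for j in range(0, len(bits) - k + 1, k):
        block = list(reversed(bits[j:j + k]))   # newest bit first
        window = block + state                  # k new bits, then v old bits
        indices.append(sum(b << i for i, b in enumerate(window)))
        state = window[:v]                      # the v most recent bits
    return indices


# k = 1, v = 2: from state (00), input 1 selects x(100) and leads to state (10).
print(window_indices([1, 0, 0], k=1, v=2))
```

For the input 1, 0, 0 the windows are (100), (010), (001), so the encoder passes through states (10), (01), and back to (00). A table of real symbol values indexed by these integers would complete the encoder.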

Other trellis codes exist. For example, we could define a code with
just three trellis states, or the symbols $x_j$ could also depend on the time
index $j$. However, we shall only consider trellis codes determined by
(2). The trellis structure of (2) is identical to that of linear algebraic
convolutional codes. We use the term sliding window trellis codes for
trellis codes determined by (2).

Fig. 1. Diagram of a trellis code.

To simplify the discussion in the text we shall assume that k divides 
v. The general case is treated in Appendix B. 

The problem we consider involves certain distance properties of 
trellis codes. To motivate it, consider the decoding problem. Optimum 
decoding involves finding the most likely path through the trellis, 
given the observed sequence (1) [3]. Typically, the path chosen will not
coincide with the correct path for all time but will occasionally diverge 
from it and remerge at a later time. This is called an error event, and 
we generically denote it by the letter E. For example, with the trellis 
in Fig. 1, x(000) may have been sent several times in succession, 
resulting in the straight path shown in Fig. 2, but noise may have 
caused the decoder to choose an alternate path. In Fig. 2 the decoder 
chose the symbols x(100), x(010), x(001) instead of x(000), x(000), 
x(000). 

An error event $E$ of length $L$ lasts from time $i$ to time $i + L$, the
decoder having decided upon the symbol sequence $\hat{x}_{i+1}, \ldots, \hat{x}_{i+L}$
instead of the correct sequence $x_{i+1}, \ldots, x_{i+L}$. The (squared) Euclidean
distance $d^2$ $(= d^2(E))$ between the two paths of $E$ is given by

$$d^2 = \sum_{j=i+1}^{i+L} (x_j - \hat{x}_j)^2 \tag{3}$$

and is crucial to determining the probability $P(E)$ of an error event
$E$. With the white noise assumption made in (1), $P(E)$ is easy to
calculate and, when $d^2 \gg \sigma^2$, it is approximately given by

$$P(E) \approx \exp\left(-\frac{d^2}{8\sigma^2}\right). \tag{4}$$

Equation (4) leads us to expect that, for small noise, symbol error 
probabilities will be determined by error events having the smallest 
minimum distance between their two paths and it becomes of interest 
to design codes that have good minimum-distance properties in this 
sense. Such designs have recently been considered by Ungerboeck,
who obtained on the order of 3-dB performance improvements (factor
of 2 in minimum distance) over the uncoded case for $k = 1, 2$, and four
or eight states in the trellis [4].

Fig. 2. Example of an error event.
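The role of the squared distance in the error-event probability can be illustrated numerically. The sketch below assumes the standard result that, in white Gaussian noise of variance sigma^2 per sample, the probability of preferring a wrong path at Euclidean distance d is Q(d / (2 sigma)), which is bounded above by exp(-d^2 / (8 sigma^2)). The symbol values are hypothetical and chosen only for illustration.

```python
import math


def squared_distance(correct, decoded):
    """Squared Euclidean distance between two symbol sequences, as in (3)."""
    return sum((x - y) ** 2 for x, y in zip(correct, decoded))


def two_path_error_probability(d2, sigma):
    """Q(d / (2*sigma)): probability that white Gaussian noise of variance
    sigma**2 per sample makes the wrong path look closer than the correct one."""
    return 0.5 * math.erfc(math.sqrt(d2) / (2.0 * sigma * math.sqrt(2.0)))


# Hypothetical symbol values for the error event of Fig. 2.
correct = [-1.0, -1.0, -1.0]   # x(000), x(000), x(000)
decoded = [1.0, 3.0, -3.0]     # x(100), x(010), x(001)
d2 = squared_distance(correct, decoded)
p_exact = two_path_error_probability(d2, sigma=1.0)
p_exponential = math.exp(-d2 / 8.0)   # exp(-d^2 / (8 sigma^2)) with sigma = 1
print(d2, p_exact, p_exponential)
```

Both quantities decay with the same exponent in d^2, which is why the error events of smallest distance dominate at low noise.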

Ungerboeck based his designs on a computer search of binary
convolutional codes with $2^v$ states, rate $k/(k + 1)$, and a particular
mapping of the output binary $(k + 1)$ tuples to $2^{k+1}$ equally spaced
channel symbols ($\pm 1$, $\pm 3$, etc.). His use of convolutional codes thus
conforms to the general scheme of (2), which implies the same trellis
structure as described herein. However, his a priori choice of only $2^{k+1}$
equally spaced channel symbols is certainly restrictive in principle. In
this paper we consider the natural question of how large $d_{\min}^2/P$ can
be made if these restrictions are removed. Here, $d_{\min}$ is the minimum
distance between all pairs of paths associated with error events in the
trellis, and $P$ is the average transmitted power.

Section II gives a detailed description of the trellis structure and of
error events. If $S$ is a finite set of error events, then

$$\min_{E \in S} d^2(E) \le \frac{1}{|S|} \sum_{E \in S} d^2(E), \tag{5}$$

since the minimum of a set of real numbers is bounded above by their
average. This observation is the basis of our first two bounds. The
first and simplest bound is

$$\frac{d_{\min}^2}{P} \le 4\left(1 + \frac{v}{k}\right), \tag{6}$$

which is obtained in Section III. A more detailed analysis in Section
IV gives

$$\frac{d_{\min}^2}{P} \le \frac{2^{k+1}}{2^k - 1}\left(1 + \frac{v}{k}\right), \tag{7}$$

which is stronger than (6) provided $k > 1$. Let $T$ be another finite set
of error events and let $r_1, r_2 \ge 0$ be real numbers satisfying $r_1 + r_2 = 1$.
Then,

$$\min_{E \in S \cup T} d^2(E) \le r_1\left(\frac{1}{|S|}\sum_{E \in S} d^2(E)\right) + r_2\left(\frac{1}{|T|}\sum_{E \in T} d^2(E)\right), \tag{8}$$

since the minimum of a set of real numbers is bounded above by any
weighted average of those numbers. In Section V, by choosing $S$, $T$,
$r_1$, and $r_2$ appropriately, we prove

$$\frac{d_{\min}^2}{P} \le \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{v}{k}\right). \tag{9}$$

This bound is stronger than (7) provided $v > k(2^k - 1)$. Combining (7)
and (9) we have

$$\frac{d_{\min}^2}{P} \le \min\left\{\frac{2^{k+1}}{2^k - 1}\left(1 + \frac{v}{k}\right),\ \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{v}{k}\right)\right\}. \tag{10}$$

Extensions of bounds (6), (7), and (9) to the case when $k$ does not
divide $v$ are given in Appendix B.
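To get a feel for how the three bounds compare, they can be evaluated exactly with rational arithmetic. The sketch below assumes that k divides v and takes the bounds in the forms 4(1 + v/k), 2^(k+1)/(2^k - 1) (1 + v/k), and 2^(2k+1)/(2^(2k) - 1) (2 + v/k), as read from Theorems 1 and 2 and the statement in Section V. The function names are illustrative.

```python
from fractions import Fraction


def bound_first(k, v):
    """Bound (6): 4 (1 + v/k)."""
    return 4 * (1 + Fraction(v, k))


def bound_second(k, v):
    """Bound (7): 2^(k+1) / (2^k - 1) * (1 + v/k)."""
    return Fraction(2 ** (k + 1), 2 ** k - 1) * (1 + Fraction(v, k))


def bound_third(k, v):
    """Bound (9): 2^(2k+1) / (2^(2k) - 1) * (2 + v/k)."""
    return Fraction(2 ** (2 * k + 1), 2 ** (2 * k) - 1) * (2 + Fraction(v, k))


def best_bound(k, v):
    """Bound (10): the smaller of (7) and (9)."""
    return min(bound_second(k, v), bound_third(k, v))


for k, v in [(1, 2), (2, 4), (2, 6), (2, 8)]:
    print(k, v, bound_first(k, v), bound_second(k, v), bound_third(k, v))
```

At v = k(2^k - 1) the second and third bounds coincide, and for larger v the third takes over, consistent with the crossover condition stated above.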

II. A GROUP ACTION ON THE TRELLIS

In later sections we obtain upper bounds on $d_{\min}^2/P$ by considering
sets of error events that are fixed by a group of symmetries of the
trellis. In this section we describe the group.

We consider trellis codes with $2^v$ states transmitting $k$ bits/channel
symbol and for simplicity we assume that $k$ divides $v$. States are
labelled with binary $v$ tuples, and edges of the trellis are labelled with
binary $v + k$ tuples. We identify the binary $r$ tuple $(b_0, \ldots, b_{r-1})$ with
the integer

$$b_0 2^0 + b_1 2^1 + \cdots + b_{r-1} 2^{r-1}.$$

The states are labelled with binary $v$ tuples $00 \cdots 0$, $10 \cdots 0$,
$010 \cdots 0$, $110 \cdots 0$, $\ldots$, $11 \cdots 1$, in increasing order, from top to
bottom as in Fig. 1. The edges are labelled with binary $v + k$ tuples,
$x_0 = x(0 \cdots 0)$, $x_1 = x(10 \cdots 0)$, $x_2 = x(010 \cdots 0)$, $x_3 = x(110 \cdots 0)$,
$\ldots$, $x_{2^{v+k}-1} = x(11 \cdots 1)$, also in increasing order, from top to bottom
as in Fig. 1. Set $N = (k + v)/k$. If we write an edge label as
$x(s_0, \ldots, s_{N-1})$, then it will be understood that each $s_j$ is a binary $k$ tuple. A "+"
appearing in the argument of a label means bit-by-bit modulo 2
addition. A similar notation will be used for states.

We define a group of symmetries of the trellis. These symmetries
will map error events of length $L$ to error events of length $L$. For each
binary $v + k$ tuple $t$, we define a permutation $g_t$ of the edge labels $x(s)$
by the rule

$$g_t(x(s)) = x(s + t). \tag{11}$$

For example, when $k = 1$, $v = 2$, and $t = (010)$,

$$g_{010}:\quad x_0 \leftrightarrow x_2,\quad x_1 \leftrightarrow x_3,\quad x_4 \leftrightarrow x_6,\quad x_5 \leftrightarrow x_7. \tag{12}$$

This may also be written

$$g_{010}(x) = Tx, \tag{13}$$

where $T$ is the permutation matrix

$$T = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0
\end{pmatrix}. \tag{14}$$

If $G_{k,v} = \{g_t \mid t \text{ is a binary } v + k \text{ tuple}\}$, then $G_{k,v}$ is an abelian group of
order $2^{k+v}$, and every element $g_t$ of $G_{k,v}$ satisfies $g_t^2 = e$, where $e$ is the
group identity.

Lemma 1: Any pair of edge labels is interchanged by a unique group
element.

Proof: Edge labels $x(s)$ and $x(u)$ are interchanged only by $g_{s+u}$. □
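Because labels are identified with integers through their binary expansions, the mod-2 addition in (11) becomes a bitwise XOR of indices. The sketch below uses that observation to list the pairs interchanged by g_010 and to check Lemma 1 by brute force. The helper names are illustrative, not from the paper.

```python
def label_to_index(bits):
    """Identify the tuple (b_0, ..., b_{r-1}) with the integer b_0 + 2 b_1 + ..."""
    return sum(b << i for i, b in enumerate(bits))


def g(t, s):
    """g_t sends x(s) to x(s + t); on integer indices, mod-2 addition is XOR."""
    return s ^ t


def swapped_pairs(t, n):
    """Unordered pairs of edge indices interchanged by g_t on n-bit labels."""
    return sorted({tuple(sorted((s, g(t, s)))) for s in range(1 << n)})


def interchangers(s, u, n):
    """All t with g_t(x(s)) = x(u); Lemma 1 says there is exactly one."""
    return [t for t in range(1 << n) if g(t, s) == u]


# k = 1, v = 2, t = (010): indices differing by 2 are swapped.
print(swapped_pairs(label_to_index((0, 1, 0)), 3))
```

Since XOR is commutative and every element is its own inverse, the abelian and order-2 properties of the group are immediate in this representation.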

We call the time sections (0, 1), (1, 2), $\ldots$ the components of the
trellis. We shall now show how to choose binary $v + k$ tuples $t = t^0, t^1,
\ldots$ so that if $g_{t^i}$ is applied to the edges in component $i$, then an error
event of length $L$ is always mapped to another error event of length
$L$. It is, in general, necessary to choose a different $g_t$ for each component
since if we simply apply the same permutation $g_t$ to the edges in
every component, then an error event $E$ need not be transformed to
another error event. Thus, if $g_{010}$ is applied to each component of the
error event shown in Fig. 2, then we obtain the edges shown in Fig. 3.
The permutation $g_{010}$ transforms the edge labelled $x(uvw)$, joining state
$vw$ and state $uv$, into the edge labelled $x(u(1 + v)w)$, joining state
$(1 + v)w$ and state $u(1 + v)$. If $t = t^0 = 010$, then $g_{010}$ permutes the
encoder states at time 0 by the rule

$$vw \mapsto (1 + v)w, \tag{15}$$

and permutes the encoder states at time 1 by the rule

$$uv \mapsto u(1 + v). \tag{16}$$

Fig. 3. Permutation $g_{010}$ applied to all edges of an error event.

Similarly, the permutation $g_{t^1}$ permutes encoder states at time 1 and
encoder states at time 2. If we want to map error events to error
events, then the action of $g_{t^1}$ on encoder states at time 1 must be given
by (16). Choose $t^1 = 001$, $t^2 = 100$, $t^3 = 010$, $t^4 = 001$, $\ldots$. The action
of $g_{t^0}$, $g_{t^1}$, and $g_{t^2}$ on components 0, 1, and 2 is shown in Fig. 4. Thus
the sequence $(g_{t^0}, g_{t^1}, g_{t^2}, \ldots)$ transforms the error event shown in
Fig. 2 to the error event shown in Fig. 5.

For general $k$ and $v$, let $t = t^0 = (t_0, \ldots, t_{N-1})$ where $N = (k + v)/k$
and $t_0, \ldots, t_{N-1}$ are binary $k$ tuples. Let $t^1 = (t_{N-1}, t_0, \ldots, t_{N-2})$ be
the vector obtained from $t^0$ by cycling the blocks of $k$ bits to the right
and moving the last block, $t_{N-1}$, to the front. Repeat this operation $i$
times to obtain $t^i = (t_{N-i}, \ldots, t_{N-1}, t_0, \ldots, t_{N-i-1})$. For $i \ge N$ we view
$i$ as an integer modulo $N$. Thus $t^N = t^0 = t$, $t^{N+1} = t^1$, $\ldots$. The action
of $g_{t^i}$ on encoder states at time $i$ coincides with that of $g_{t^{i-1}}$, being given
by the rule

$$s \mapsto (t_{N-i+1}, \ldots, t_{N-1}, t_0, \ldots, t_{N-i-1}) + s. \tag{17}$$

If $G^*_{k,v} = \{(g_{t^0}, g_{t^1}, \ldots) \mid t^0 \text{ is a binary } v + k \text{ tuple}\}$, then $G^*_{k,v}$ is a group of
$2^{v+k}$ symmetries of the trellis. The group $G^*_{k,v}$ is abelian, and every
element has order 2. We denote $(g_{t^0}, g_{t^1}, \ldots)$ by $g^*_t$ since it is
determined by $t^0$.

Lemma 2: If $i \ge 0$ and if $x(s)$, $x(t)$ are any pair of edge labels in
component $i$, then there is a unique element of $G^*_{k,v}$ that interchanges
$x(s)$ and $x(t)$.

Proof: This follows from Lemma 1, since the restriction of $G^*_{k,v}$ to the
edges in component $i$ is just $G_{k,v}$. □

A set $S$ of error events is said to be fixed by $G^*_{k,v}$ if for all $g \in G^*_{k,v}$
and all $E \in S$ we have $g(E) \in S$.
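The need for a different permutation in each component can be checked directly. In the sketch below an edge label is a tuple of N blocks, each k-bit block stored as an integer so that blockwise mod-2 addition is XOR. The edge x(s_0, ..., s_{N-1}) is taken to start in state (s_1, ..., s_{N-1}) and end in state (s_0, ..., s_{N-2}), as in the k = 1, v = 2 example. The function names are illustrative.

```python
def shift_blocks(t, i):
    """t^i: cycle the N blocks of t to the right i times (i taken modulo N)."""
    i %= len(t)
    return t[len(t) - i:] + t[:len(t) - i]


def apply_symmetry(t, path):
    """Apply g_{t^i} to the edge in component i of a path that starts at time 0."""
    return [tuple(s ^ u for s, u in zip(edge, shift_blocks(t, i)))
            for i, edge in enumerate(path)]


def is_path(path):
    """Consecutive edges must agree on the state they share."""
    return all(a[:-1] == b[1:] for a, b in zip(path, path[1:]))


fig2_path = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # x(100), x(010), x(001)
t = (0, 1, 0)
moved = apply_symmetry(t, fig2_path)
same_everywhere = [tuple(s ^ u for s, u in zip(edge, t)) for edge in fig2_path]
print(moved, is_path(moved), is_path(same_everywhere))
```

With the cyclically shifted tuples the image is again a connected path, whereas applying g_010 in every component breaks the path apart, which is the situation depicted in Fig. 3.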
Fig. 4. Action of $g_{t^0}$, $g_{t^1}$, and $g_{t^2}$.

Fig. 5. The symmetry $(g_{t^0}, g_{t^1}, g_{t^2}, \ldots)$ applied to an error event.

Lemma 3: Let $i \ge 0$ and let $S$ be a set of error events of the same length
that is fixed by $G^*_{k,v}$. If $m_i(x(a))$ is the total number of times the edge
label $x(a)$ occurs in component $i$ of the error events of $S$, then

$$m_i(x(a)) = \frac{2|S|}{2^{k+v}}$$

for all $v + k$ tuples $a$.

Proof: Let $s$, $t$ be binary $v + k$ tuples. By Lemma 2 there is an element
of $G^*_{k,v}$ interchanging error events involving $x(s)$ in component $i$ with
error events involving $x(t)$ in component $i$. Hence $m_i(x(s)) = m_i(x(t))$.
Since the total number of edges in component $i$ is $2|S|$, we have
$m_i(x(a)) = 2|S|/2^{k+v}$ for all $v + k$ tuples $a$. □

An orbit $S$ of the group $G^*_{k,v}$ is a set of error events satisfying

1. if $E \in S$ and $g \in G^*_{k,v}$ then $g(E) \in S$, and

2. if $E_1, E_2 \in S$ then there exists $g \in G^*_{k,v}$ such that $g(E_1) = E_2$.

Fig. 6 shows an orbit of $G^*_{1,2}$. Observe that $m_i(x(a)) = 1$ for all $i$ and
for all $a$.

III. THE FIRST BOUND 

In this section we derive the upper bound

$$\frac{d_{\min}^2}{P} \le 4\left(1 + \frac{v}{k}\right).$$

This bound will be strengthened in later sections but it seems worth
presenting the simpler argument here.

Observe that the average transmitted signal power is simply the
average of the squares of the transmitted channel symbols, namely

$$P = \frac{1}{2^{v+k}} \sum_{i=0}^{2^{v+k}-1} x_i^2. \tag{18}$$

Fig. 6. An orbit of $G^*_{1,2}$.

(Recall that the channel symbol $x(a_0 \cdots a_{v+k-1})$ is also denoted $x_i$,
where $i = a_0 + a_1 2^1 + \cdots + a_{v+k-1} 2^{v+k-1}$.) The Euclidean distance
between the paths of the error event $E$ shown in Fig. 2 is

$$d^2(E) = (x_0 - x_1)^2 + (x_0 - x_2)^2 + (x_0 - x_4)^2,$$

which is a quadratic form in the variables $x_i$. In general we define

$$x^T = (x_0, x_1, \ldots, x_{2^{v+k}-1}), \tag{19}$$

where the superscript $T$ denotes matrix transpose. Then the Euclidean
distance $d^2(E)$ between the paths of an error event $E$ is given by

$$d^2(E) = x^T A(E) x, \tag{20}$$

where $A(E)$ is a symmetric, positive semi-definite matrix which we
call the distance matrix of $E$. The distance matrix $A(E)$ has two
properties that we wish to note:

Property I. The $i$th diagonal element of $A(E)$ counts the number
of times the symbol $x_i$ occurs in the error event.

Property II. The rows of $A(E)$ sum to zero.
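The distance matrix is easy to build explicitly. The sketch below assumes the natural construction in which each component with edge indices p and q contributes the matrix of the quadratic form (x_p - x_q)^2. It checks Properties I and II for the error event of Fig. 2, using a hypothetical set of eight equally spaced symbol values.

```python
def distance_matrix(pairs, size):
    """A(E): each component with edge indices (p, q) adds the matrix of (x_p - x_q)^2."""
    A = [[0] * size for _ in range(size)]
    for p, q in pairs:
        A[p][p] += 1
        A[q][q] += 1
        A[p][q] -= 1
        A[q][p] -= 1
    return A


def quadratic_form(A, x):
    """Evaluate x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


# Error event of Fig. 2: x(000) three times versus x(100), x(010), x(001).
fig2_pairs = [(0, 1), (0, 2), (0, 4)]
A = distance_matrix(fig2_pairs, 8)
x = [-7, -5, -3, -1, 1, 3, 5, 7]    # hypothetical symbol values
print([A[i][i] for i in range(8)], quadratic_form(A, x))
```

The diagonal records that x_0 is used three times and x_1, x_2, x_4 once each, and every row sums to zero because each component adds +1 and -1 to the two rows it touches.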
By (18) and (20),

$$\frac{d_{\min}^2}{P} = \min_E \frac{x^T A(E) x}{P} = 2^{v+k} \min_E \frac{x^T A(E) x}{x^T x}, \tag{21}$$

where we minimize over all error events $E$.

Although we will make no use of the fact in this work, we note that
in (21) only a finite number of error events need be considered, for no
error event need be considered that has a repeated pair of states. Thus,
if the pair of states $u$ and $w$ occur at time $i$ and also at a later time $j$,
all components between $i$ and $j$ may be eliminated and the remainder
of the error event after time $j$ may be placed after time $i$. Since
components cannot make a negative contribution to $d^2(E)$ the new
error event has distance no greater than the original one. By (21) the
best normalized minimum distance that can be achieved for any choice
of channel symbols is

$$2^{v+k} \max_x \min_E \frac{x^T A(E) x}{x^T x}. \tag{22}$$

Consider an error event $E$ with initial state (time $t = 0$) $a =
(a_1, \ldots, a_{N-1})$ and final state $z = (z_1, \ldots, z_{N-1})$. If $k$ tuples $b_1$, $b_1^*$
are input at time 0, then at time 1 the two paths occupy states
$(b_1, a_1, \ldots, a_{N-2})$ and $(b_1^*, a_1, \ldots, a_{N-2})$. There must be at least $N - 1$
further inputs before the paths can remerge. To remerge at $z$, the $k$
tuples $z_{N-1}, z_{N-2}, \ldots, z_1$ must be input in that order to both paths.
We denote this error event by $E(a, z; b_1, b_1^*)$. Thus, the minimal length
of an error event is $N = (k + v)/k$. Fig. 2 shows the error event
$E(00, 00; 0, 1)$ which has minimal length 3.
Given an arbitrary set $S$ of error events, define

$$Q(S) = \frac{1}{|S|} \sum_{E \in S} A(E). \tag{23}$$

Let $S_N$ be the set of all error events of length $N$. Note that $S_N$ is fixed
by the group $G^*_{k,v}$.

Theorem 1: If $k$ divides $v$ then the normalized minimum distance of any
sliding window trellis code with $2^v$ states and rate $k$ bits/channel symbol
satisfies

$$\frac{d_{\min}^2}{P} \le 4\left(1 + \frac{v}{k}\right).$$

Proof: By (22),

$$\frac{d_{\min}^2}{P} \le 2^{v+k} \max_x \min_E \frac{x^T A(E) x}{x^T x}
\le 2^{v+k} \max_x \min_{E \in S_N} \frac{x^T A(E) x}{x^T x}
\le \frac{2^{v+k}}{|S_N|} \max_x \frac{x^T \left(\sum_{E \in S_N} A(E)\right) x}{x^T x}.$$

The last inequality simply states that the minimum is not more than
the average. Setting $A_N = \sum_{E \in S_N} A(E)$, we have

$$\frac{2^{v+k}}{|S_N|} \max_x \frac{x^T A_N x}{x^T x} = \frac{2^{v+k}}{|S_N|} \lambda_1(A_N),$$

where $\lambda_1(A_N)$ denotes the largest eigenvalue of $A_N$. By Property I, the
$i$th diagonal entry of $A_N$ counts the total number of times the edge $x_i$
appears in some component of the error events of length $N$. By Lemma
3 all diagonal entries are equal to $2N|S_N|/2^{v+k}$. Property II implies
that all row sums of $A_N$ are zero. By the Gersgorin Circle Theorem [6],

$$\lambda_1(A_N) \le 2(\text{diagonal entry}) = 2\left(\frac{2N|S_N|}{2^{v+k}}\right),$$

and so

$$\frac{d_{\min}^2}{P} \le 4N = 4\left(1 + \frac{v}{k}\right). \qquad \square$$
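The proof can be replayed numerically for small parameters by enumerating every minimal-length error event. The sketch below is an illustrative brute-force check, not part of the original argument. It stores each k-bit block as an integer, accumulates the diagonal and the Gersgorin radius of A_N, and returns the resulting bound. The function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations, product


def path_indices(a, z, first_input, k):
    """Edge indices of the length-N path that starts in state a, takes
    first_input, and is then driven into state z by z_{N-1}, ..., z_1."""
    state, indices = tuple(a), []
    for block in (first_input,) + tuple(reversed(z)):
        edge = (block,) + state                       # newest block first
        indices.append(sum(b << (k * i) for i, b in enumerate(edge)))
        state = edge[:-1]
    return indices


def gershgorin_bound(k, v):
    """Bound on d_min^2 / P from the Gersgorin estimate of lambda_1(A_N)."""
    N = (k + v) // k
    size = 1 << (k + v)
    blocks = range(1 << k)
    diag = [0] * size       # diagonal of A_N
    radius = [0] * size     # sum of |off-diagonal entries| in each row of A_N
    count = 0               # |S_N|
    for a in product(blocks, repeat=N - 1):
        for z in product(blocks, repeat=N - 1):
            for b, b_star in combinations(blocks, 2):
                count += 1
                pairs = zip(path_indices(a, z, b, k), path_indices(a, z, b_star, k))
                for p, q in pairs:
                    diag[p] += 1
                    diag[q] += 1
                    radius[p] += 1      # every off-diagonal contribution is -1
                    radius[q] += 1
    assert len(set(diag)) == 1          # Lemma 3: all diagonal entries agree
    lam_max = max(d + r for d, r in zip(diag, radius))
    return Fraction(size * lam_max, count)


print(gershgorin_bound(1, 2), gershgorin_bound(2, 2))
```

For these small cases the enumeration returns 4N, in agreement with the theorem. The enumeration grows like 2^(2v+2k), so it is only practical as a sanity check.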
Remarks: In Section IV we derive a formula for $Q(S_N)$, and, by
computing $\lambda_1(A_N)$, we prove

$$\frac{d_{\min}^2}{P} \le \frac{2^{k+1}}{2^k - 1}\left(1 + \frac{v}{k}\right).$$

In Appendix B we prove that if $v = (N - 1)k + l$, where $0 < l < k$,
then



=£4 1 + 



where $\lfloor y \rfloor$ denotes the integer part of $y$.

IV. A FORMULA FOR Q(S N ) AND A SHARPER BOUND 

In this section we derive a formula for $Q(S_N)$, the matrix obtained
by averaging the distance matrices of all error events of minimal
length $N = (k + v)/k$. We require a matrix representation of the group
$G_{k,v}$.

If $A$ is an $m \times n$ matrix and $B$ is an $m_1 \times n_1$ matrix, then the tensor
product $A \otimes B$ (also called the Kronecker product) is the $m m_1 \times n n_1$
matrix

$$A \otimes B = \begin{pmatrix}
a_{11} B & a_{12} B & \cdots & a_{1n} B \\
a_{21} B & a_{22} B & \cdots & a_{2n} B \\
\vdots & \vdots & & \vdots \\
a_{m1} B & a_{m2} B & \cdots & a_{mn} B
\end{pmatrix}.$$

Tensor products are discussed in Ref. 5, where they are called direct
products. For appropriately sized matrices, $A$, $B$, $C$, and $D$, we have
$(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$. If $\lambda$ is an eigenvalue of $A$ with
associated eigenvector $u$, and $\mu$ is an eigenvalue of $B$ with associated
eigenvector $w$, then $\lambda\mu$ is an eigenvalue of $A \otimes B$ with eigenvector
$u \otimes w$.

We denote the $n \times n$ identity matrix by $I_n$ and we abbreviate $I_2$ to
$I$. Set

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{24}$$
Consider the $2^{v+k} \times 2^{v+k}$ matrix

$$P_i = \underbrace{I \otimes \cdots \otimes I}_{j \text{ terms}} \otimes A \otimes \underbrace{I \otimes \cdots \otimes I}_{i \text{ terms}} = I_{2^j} \otimes A \otimes I_{2^i}, \tag{25}$$

where $i + j + 1 = v + k$. This is the block-diagonal matrix

$$P_i = \operatorname{diag}\left(\begin{pmatrix} 0 & I_{2^i} \\ I_{2^i} & 0 \end{pmatrix}, \ldots, \begin{pmatrix} 0 & I_{2^i} \\ I_{2^i} & 0 \end{pmatrix}\right)$$

with the indicated block repeated $2^j$ times along the main diagonal.
Define $u_i$, $i = 0, 1, \ldots, v + k - 1$, to be the binary $v + k$ tuple with a 1
in position $i$ and 0's elsewhere. Let

$$x = (x_0, \ldots, x_{2^{v+k}-1})^T = (x(0 \cdots 0), \ldots, x(1 \cdots 1))^T.$$

The permutation $g_{u_i}$ maps $x(s)$ to $x(u_i + s)$ and so it interchanges edges
with subscripts differing by $2^i$. But this is precisely the effect of the
transformation $x \to P_i x$. If $t$ is an arbitrary $v + k$ tuple then the matrix
describing the permutation $g_t$ is obtained by multiplying the appropriate
matrices $P_i$. For $t = (t_0, t_1, \ldots, t_{v+k-1})$ we define

$$M(t) = M_{v+k-1} \otimes \cdots \otimes M_1 \otimes M_0, \tag{26}$$

where

$$M_j = \begin{cases} I & \text{if } t_j = 0, \\ A & \text{if } t_j = 1. \end{cases} \tag{27}$$

Note that the subscript order in (26) is the reverse of the subscript
order in the vector $t$. We have now proved the following lemma.

Lemma 4: If $t$ is a $v + k$ tuple, then the permutation $g_t: x(s) \to x(s + t)$
is represented by $x \to M(t)x$.

As an example, the permutation $g_{010}$ given in (12) is represented by
the matrix $P = I \otimes A \otimes I$ given in (14). By Lemma 4 we may regard
$G_{k,v}$ as the following group of matrices:

$$G_{k,v} = \{M_{v+k-1} \otimes \cdots \otimes M_1 \otimes M_0 \mid M_j = I \text{ or } A,\ j = 0, \ldots, v + k - 1\}. \tag{28}$$
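Lemma 4 can be confirmed by building M(t) from Kronecker products and comparing it with the permutation that XORs each index with the integer form of t. The sketch below uses plain nested lists so that it has no external dependencies. The helper names are illustrative.

```python
from itertools import product

I2 = [[1, 0], [0, 1]]
A2 = [[0, 1], [1, 0]]


def kron(X, Y):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in row_x for y in row_y] for row_x in X for row_y in Y]


def M(t):
    """M(t) = M_{n-1} (x) ... (x) M_1 (x) M_0 with M_j = A if t_j = 1, else I."""
    result = [[1]]
    for bit in reversed(t):          # the leftmost factor belongs to the last bit of t
        result = kron(result, A2 if bit else I2)
    return result


def xor_permutation(t):
    """Matrix of x(s) -> x(s + t): row r has its single 1 in column r XOR t."""
    n = len(t)
    offset = sum(b << i for i, b in enumerate(t))
    return [[1 if (r ^ offset) == c else 0 for c in range(1 << n)]
            for r in range(1 << n)]


assert all(M(t) == xor_permutation(t) for t in product((0, 1), repeat=3))
print(M((0, 1, 0))[0])   # row 0 of I (x) A (x) I
```

The reversal inside `M` mirrors the remark that the subscript order in (26) is the reverse of the order in t: the last tensor factor acts on the least significant bit of the index.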

We shall prove that $Q(S_N)$ is a particular linear combination of
matrices $M(t)$ in $G_{k,v}$. To calculate $\lambda_1(Q(S_N))$ we need to work with
eigenvectors and eigenvalues of the matrices $M(t)$.

The matrices $M(t)$ are symmetric and they all commute; hence, they
can be simultaneously diagonalized. Let $H$ be the tensor product of
$v + k$ copies of

$$\frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Observe that $H^{-1} = H^T$. Since $(1, 1)^T$ and $(1, -1)^T$ are eigenvectors of
$A$, the columns of $H$ are eigenvectors of $M(t)$ for all $v + k$ tuples $t$.
Thus, $H^T M(t) H$ is diagonal for every matrix $M(t)$ in $G_{k,v}$. If $p =
(p_0, \ldots, p_{v+k-1})$ is a binary $v + k$ tuple, define

$$w(p) = w_{v+k-1} \otimes \cdots \otimes w_1 \otimes w_0,$$

where

$$w_j = \begin{cases} (1, 1)^T & \text{if } p_j = 0, \\ (1, -1)^T & \text{if } p_j = 1. \end{cases} \tag{29}$$

The vectors $w(p)$ are, up to the normalizing factor $2^{-(v+k)/2}$, the columns
of $H$. Note that $w(p)$ is formed by
reversing the vector $p$. We have $A(1, 1)^T = (1, 1)^T$ and $A(1, -1)^T =
-(1, -1)^T$. If $t = (t_0, \ldots, t_{v+k-1})$ then by (27) and (29)

$$M(t) w(p) = \bigotimes_{j=v+k-1}^{0} M_j w_j = \left(\prod_{j=0}^{v+k-1} (-1)^{t_j p_j}\right) w(p) = (-1)^{p \cdot t} w(p), \tag{30}$$

where $p \cdot t$ is the dot product of the vectors $p$ and $t$.
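Equation (30) says that every w(p) is an eigenvector of every M(t), with eigenvalue +1 or -1 according to the parity of p . t. The sketch below checks this exhaustively for three-bit labels, using the fact from Lemma 4 that M(t) acts on a vector by XOR-ing the index with the integer form of t. The helper names are illustrative.

```python
from itertools import product


def w(p):
    """w(p) = w_{n-1} (x) ... (x) w_0 with w_j = (1, -1) if p_j = 1, else (1, 1)."""
    vec = [1]
    for bit in reversed(p):
        factor = [1, -1] if bit else [1, 1]
        vec = [a * b for a in vec for b in factor]
    return vec


def apply_M(t, vec):
    """(M(t) vec)_r = vec_{r XOR t}, the permutation action of Lemma 4."""
    offset = sum(b << i for i, b in enumerate(t))
    return [vec[r ^ offset] for r in range(len(vec))]


def eigenvalue(p, t):
    """(-1)^(p . t), as in (30)."""
    return -1 if sum(a * b for a, b in zip(p, t)) % 2 else 1


n = 3
for p in product((0, 1), repeat=n):
    for t in product((0, 1), repeat=n):
        assert apply_M(t, w(p)) == [eigenvalue(p, t) * c for c in w(p)]
print(w((1, 0, 0)))
```

The entry of w(p) at index r is (-1)^(p . r), where r is read as a bit vector, so these are the rows of an unnormalized Walsh-Hadamard matrix.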
Lemma 5: Suppose $R$ is a diagonable matrix that commutes with every
matrix $M(t)$ in $G_{k,v}$. Then $R$ is a linear combination of the matrices
$M(t)$ in $G_{k,v}$.

Proof: If $s$, $t$ are different $v + k$ tuples, then by Lemma 1, $g_s(x_0) \ne
g_t(x_0)$. The permutation matrices $M(t)$ are therefore linearly independent
because the 1's in row 0 are in different positions. Thus we have
$2^{v+k}$ linearly independent diagonal matrices $H^{-1} M(t) H$. Since $R$ commutes
with every matrix $M(t)$, $H^{-1} R H$ commutes with every matrix
$H^{-1} M(t) H$, and therefore $H^{-1} R H$ is diagonal. The matrices $H^{-1} M(t) H$
span the set of diagonal matrices so $H^{-1} R H$ is a linear combination of
matrices $H^{-1} M(t) H$ and the lemma follows. □

Lemma 6: If $S$ is a set of error events fixed by $G^*_{k,v}$, then $\sum_{E \in S} A(E)$ is
a linear combination of the matrices $M(t)$ in $G_{k,v}$.

Proof: The distance matrix $A(E)$ of an error event is the sum of
contributions from each component:

$$A(E) = \sum_c A_c(E), \tag{31}$$

where we sum over the components $c$ of $E$. The restriction of $G^*_{k,v}$ to
the edges in any component $c$ is just the group $G_{k,v}$. If edges $x(s)$, $x(s')$
appear in component $c$ of error event $E$, then edges $g_t(x(s))$, $g_t(x(s'))$
appear in component $c$ of error event $E' = g_t(E)$. We have

$$A_c(E') = M(t)^T A_c(E) M(t). \tag{32}$$

Since $M(t)$ is a permutation matrix and $M(t)^2 = I$, we have $M(t)^T =
M(t)^{-1}$. Now $g_t$ merely permutes the error events in $S$, so that by (32),

$$\sum_{E \in S} A_c(E) = \sum_{E \in S} A_c(g_t(E)) = \sum_{E \in S} M(t)^{-1} A_c(E) M(t)
= M(t)^{-1} \left(\sum_{E \in S} A_c(E)\right) M(t) \tag{33}$$

for all matrices $M(t)$ and for all components $c$. Summing (33) over all
components $c$ shows that the symmetric matrix $\sum_{E \in S} A(E)$ commutes with
every $M(t)$, and Lemma 5 finishes the proof. □

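Example: If $S$ is the orbit of error events shown in Fig. 6 then

$$\sum_{E \in S} A(E) = \begin{pmatrix}
3 & -1 & -1 & 0 & -1 & 0 & 0 & 0 \\
-1 & 3 & 0 & -1 & 0 & -1 & 0 & 0 \\
-1 & 0 & 3 & -1 & 0 & 0 & -1 & 0 \\
0 & -1 & -1 & 3 & 0 & 0 & 0 & -1 \\
-1 & 0 & 0 & 0 & 3 & -1 & -1 & 0 \\
0 & -1 & 0 & 0 & -1 & 3 & 0 & -1 \\
0 & 0 & -1 & 0 & -1 & 0 & 3 & -1 \\
0 & 0 & 0 & -1 & 0 & -1 & -1 & 3
\end{pmatrix}
= 3\, I \otimes I \otimes I - (I \otimes I \otimes A + I \otimes A \otimes I + A \otimes I \otimes I). \tag{34}$$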

Consider $S_N$, the set of all error events $E(a, z; b_1, b_1^*)$ of minimal
length $N = (k + v)/k$. Recall that $a = (a_1, \ldots, a_{N-1})$ is the initial state,
$z = (z_1, \ldots, z_{N-1})$ is the final state, and $b_1$, $b_1^*$ are the first pair of
inputs. We have $|S_N| = \binom{2^k}{2} 2^v 2^v$.

Lemma 7:

(1) Let $t = (t_0, \ldots, t_{N-1})$ and let $t' = (t_1, \ldots, t_{N-1})$ where $t_i$, $i = 0, 1,
\ldots, N - 1$, is a binary $k$ tuple. If $g^*_t = (g_{t^0}, g_{t^1}, \ldots, g_{t^{N-1}}) \in G^*_{k,v}$ then

$$g^*_t(E(a, z; b_1, b_1^*)) = E(a + t', z + t'; b_1 + t_0, b_1^* + t_0). \tag{35}$$

(2) The group $G^*_{k,v}$ partitions the set $S_N$ of error events of length $N$ into
$2^v(2^k - 1)$ orbits each of size $2^{v+k-1}$.

Proof: Part (1) follows from the definition of $g_{t^i}$ given in (17). To
verify part (2) we note that, apart from the identity, $E(a, z; b_1, b_1^*)$ is
fixed only by the symmetry $g^*_b$, where $b = (b_1 + b_1^*, 0, 0, \ldots, 0)$. Hence, each orbit consists of
$2^{v+k-1}$ distinct error events. Since the total number of error events in
$S_N$ is $(2^v \cdot 2^v \cdot 2^k (2^k - 1))/2$, we see that there are $2^v(2^k - 1)$ orbits. □

The orbit containing the error event $E(a, z; b_1, b_1^*)$ is determined by
$a + z$ and $b_1 + b_1^*$. Setting $f = b_1 + b_1^*$, we denote this orbit by
$S(a + z; f)$. This orbit contains $E(a, z; 0, f)$; note that $f \ne 0$ because
$b_1 \ne b_1^*$. Recall that if $f$ is a $k$ tuple, then the $v + k$ tuple $(f0 \cdots 0)^i$
equals $(y_0, y_1, \ldots, y_{N-1})$ where $y_i = f$ and $y_j = 0$ for $j \ne i$.
Lemma 8: Let $S_N$ be the set of all error events of length $N$ and let
$S(a + z; f)$ be the orbit of $G^*_{k,v}$ containing the error event $E(a, z; 0, f)$.
Then

$$(1)\quad 2^{v+k} Q(S(a + z; f)) = 2N I_{2^{v+k}} - 2 \sum_{i=0}^{N-1} M((f0 \cdots 0)^i), \tag{36}$$

$$(2)\quad 2^{v+k}(2^k - 1) Q(S_N) = 2(2^k - 1) N I_{2^{v+k}} - 2 \sum_{f \ne 0} \sum_{i=0}^{N-1} M((f0 \cdots 0)^i). \tag{37}$$

Proof: We calculate the contribution to $Q(S(a + z; f))$ made by pairs
of edges in component 0. Since the restriction of $G^*_{k,v}$ to the edges in
any component is just the group $G_{k,v}$, this distance contribution is

$$\frac{1}{2^{v+k}} \sum_t \left[g_t(x(0 a_1 \cdots a_{N-1})) - g_t(x(f a_1 \cdots a_{N-1}))\right]^2$$

$$= \frac{1}{2^{v+k}} \sum_t \left[x(t + (0 a_1 \cdots a_{N-1})) - x(t + (f a_1 \cdots a_{N-1}))\right]^2$$

$$= \frac{1}{2^{v+k}} \left[2 \sum_s x(s)^2 - 2 \sum_s x(s)\, x(s + (f0 \cdots 0))\right]$$

$$= \frac{1}{2^{v+k}}\, x^T \left[2 I_{2^{v+k}} - 2 M(f0 \cdots 0)\right] x.$$

In general, the distance contribution made by edges in component $i$ is

$$\frac{1}{2^{v+k}} \sum_t \left[g_t(x(z_{N-i} \cdots z_{N-1} 0 a_1 \cdots a_{N-i-1})) - g_t(x(z_{N-i} \cdots z_{N-1} f a_1 \cdots a_{N-i-1}))\right]^2$$

$$= \frac{1}{2^{v+k}} \sum_t \left[x(t + (z_{N-i} \cdots z_{N-1} 0 a_1 \cdots a_{N-i-1})) - x(t + (z_{N-i} \cdots z_{N-1} f a_1 \cdots a_{N-i-1}))\right]^2$$

$$= \frac{1}{2^{v+k}}\, x^T \left[2 I_{2^{v+k}} - 2 M((f0 \cdots 0)^i)\right] x. \tag{38}$$

Summing (38) over all components $i$, we obtain (36). Since (36) is
independent of $a + z$, we obtain the formula for $Q(S_N)$ by summing
(36) over all nonzero $k$ tuples $f$. □

Remark: When $k = 1$, there is only one choice for $f$, namely $f = 1$, and
so every form $Q(S(a + z; f))$ is equal to $Q(S_N)$. For $k = 1$, $v = 2$, we
have

$$Q(S(00; 1)) = Q(S(10; 1)) = Q(S(01; 1)) = Q(S(11; 1)) = Q(S_3)
= \tfrac{1}{4}\left[3 I_8 - (I \otimes I \otimes A + I \otimes A \otimes I + A \otimes I \otimes I)\right]$$

[see the matrix given as (34)]. However, for $k > 1$, the form
$Q(S(a + z; f))$ will change with $f$. Thus, for $k = 2$, $v = 4$, we have

$$Q(S(a + z; 11)) = \tfrac{1}{32}\left[3 I_{64} - (I_4 \otimes I_4 \otimes (A \otimes A)
+ I_4 \otimes (A \otimes A) \otimes I_4 + (A \otimes A) \otimes I_4 \otimes I_4)\right],$$

while

$$Q(S(a + z; 10)) = \tfrac{1}{32}\left[3 I_{64} - (I_4 \otimes I_4 \otimes (I \otimes A)
+ I_4 \otimes (I \otimes A) \otimes I_4 + (I \otimes A) \otimes I_4 \otimes I_4)\right].$$

Theorem 2: If $k$ divides $v$, then the normalized minimum distance of
any sliding window trellis code with $2^v$ states and rate $k$ bits/channel
symbol satisfies

$$\frac{d_{\min}^2}{P} \le \frac{2^{k+1}}{2^k - 1}\left(1 + \frac{v}{k}\right).$$

Proof: From the proof of Theorem 1, we have

$$\frac{d_{\min}^2}{P} \le 2^{v+k} \lambda_1[Q(S_N)] = \frac{1}{2^k - 1} \lambda_1(Q_N), \tag{39}$$

where $Q_N = (2^k - 1) 2^{v+k} Q(S_N)$. Let $c = (c_0, \ldots, c_{N-1})$ be a binary $v + k$
tuple and let $\gamma$ be the number of nonzero $k$ tuples $c_i$. Then by (30), the
eigenvalue of $Q_N$ associated with $w(c)$ is

$$2(2^k - 1)N - 2 \sum_{f \ne 0} \sum_{i=0}^{N-1} (-1)^{c_i \cdot f}
= 2(2^k - 1)N - 2 \sum_{i=0}^{N-1} \sum_{f \ne 0} (-1)^{c_i \cdot f}, \tag{40}$$

where we sum over all nonzero $k$ tuples $f$. Since

$$\sum_{f \ne 0} (-1)^{c_i \cdot f} = \begin{cases} 2^k - 1 & \text{if } c_i = 0, \\ -1 & \text{if } c_i \ne 0, \end{cases}$$

eq. (40) becomes

$$2(2^k - 1)N - 2\left[(2^k - 1)(N - \gamma) - \gamma\right] = 2^{k+1} \gamma. \tag{41}$$

The largest eigenvalue of $Q_N$ is obtained when $\gamma = N = 1 + (v/k)$.
The theorem now follows from (39). □
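The eigenvalue computation in the proof can be checked by direct summation. The sketch below evaluates the double sum over nonzero k tuples f and blocks c_i, with each k tuple stored as an integer, and confirms that the result depends only on the number of nonzero blocks. The function names are illustrative.

```python
from itertools import product


def parity_dot(c, f):
    """Dot product mod 2 of two k tuples stored as integers."""
    return bin(c & f).count("1") & 1


def eigenvalue_QN(c_blocks, k):
    """2(2^k - 1)N - 2 * sum over f != 0 and i of (-1)^(c_i . f)."""
    N = len(c_blocks)
    total = sum((-1) ** parity_dot(c, f)
                for c in c_blocks for f in range(1, 1 << k))
    return 2 * (2 ** k - 1) * N - 2 * total


k, N = 2, 3
for c_blocks in product(range(1 << k), repeat=N):
    nonzero = sum(1 for c in c_blocks if c != 0)
    assert eigenvalue_QN(c_blocks, k) == 2 ** (k + 1) * nonzero
print(max(eigenvalue_QN(c, k) for c in product(range(1 << k), repeat=N)))
```

Dividing the largest eigenvalue 2^(k+1) N by 2^k - 1 reproduces the bound of the theorem.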

Remarks: Observe that the largest eigenvalue of $Q(S_N)$ is associated
with $w(c)$, where $c_0, c_1, \ldots, c_{N-1}$ are all nonzero. For example, with
$k = 1$, the largest eigenvalue, $4(1 + (v/k))$, has multiplicity one and is
associated with the eigenvector $(1, -1)^T \otimes (1, -1)^T \otimes \cdots \otimes (1, -1)^T$.
When $k > 1$, there will be several linearly independent eigenvectors
associated with $\lambda_1(Q(S_N))$ because there are several choices for $c$ with
all $c_i \ne 0$. Also, note that Theorem 2 gives the same bound as Theorem
1 when $k = 1$. For $k \ge 2$, the bound of Theorem 2 is an improvement.
In Appendix B we prove that if $v = (N - 1)k + l$ where $0 < l < k$,
then



j2 oA-H-1 



v min 



P 2*"' - 1 
where $\lfloor y \rfloor$ denotes the integer part of $y$.



1 + 



V. A FINAL BOUND OBTAINED FROM A WEIGHTED AVERAGE 

Let $S_{N+1}$ be the set of all error events of length $N + 1 = 2 + (v/k)$.
Let $Q(S_{N+1})$ be the matrix obtained by averaging the distance matrices
of all error events of length $N + 1$. In this section we derive a formula
for $Q(S_{N+1})$ and we prove

$$\frac{d_{\min}^2}{P} \le \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{v}{k}\right)$$

using a weighted average of $Q(S_N)$ and $Q(S_{N+1})$.

An error event $E$ of length $N + 1$ is determined by the initial state
$a = (a_1, \ldots, a_{N-1})$, the final state $z = (z_1, \ldots, z_{N-1})$, the inputs $b_1$, $b_1^*$
at time 0, and the inputs $b_2$, $b_2^*$ at time 1. Since the two paths diverge
at time 0, we must have $b_1 \ne b_1^*$. To remerge at $z$ the last $N - 1$ inputs
must be the $k$ tuples $z_{N-1}, z_{N-2}, \ldots, z_1$ in that order. After $N$ inputs
the two paths occupy states $z_2 \cdots z_{N-1} b_2$ and $z_2 \cdots z_{N-1} b_2^*$. At this
stage the two paths must be disjoint so $b_2 \ne b_2^*$. We denote this error
event $E$ by $E(a, z; b_1, b_1^*; b_2, b_2^*)$ [equivalently $E(a, z; b_1^*, b_1; b_2^*, b_2)$].

The group $G^*_{k,v}$ maps error events of length $N + 1$ to error events of
length $N + 1$. To be specific, let $t = (t_0, \ldots, t_{N-1})$ be a $v + k$ tuple and
set $t' = (t_1, \ldots, t_{N-1})$, $t'' = (t_0, \ldots, t_{N-2})$. If $g^*_t = (g_{t^0}, g_{t^1}, \ldots, g_{t^{N-1}},
g_{t^N})$ then it follows from the definition of $g_{t^i}$ given in (17) that

$$g^*_t(E(a, z; b_1, b_1^*; b_2, b_2^*))
= E(a + t', z + t''; b_1 + t_0, b_1^* + t_0; b_2 + t_{N-1}, b_2^* + t_{N-1}). \tag{42}$$

This group action does not preserve $a + z$ but it does preserve $b_1 + b_1^*$
and $b_2 + b_2^*$. Set $g = b_1 + b_1^*$ and $f = b_2 + b_2^*$. We denote the orbit of
$G^*_{k,v}$ containing the error event $E(a', z'; 0, g; 0, f)$ by $S(a', z'; g, f)$ (in
the discussion above, $a' = (a_1, \ldots, a_{N-2}, a_{N-1} + b_2)$ and $z' = (z_1 + b_1,
z_2, \ldots, z_{N-1})$). Note that $f, g \ne 0$.

If $f$, $g$ are $k$ tuples, then the $v + k$ tuple $(fg0 \cdots 0)^0 = (fg0 \cdots 0)$
and $(fg0 \cdots 0)^i$ is obtained from $(fg0 \cdots 0)^{i-1}$ by cycling the blocks
of $k$ bits to the right and moving the last block to the front. Thus
$(fg0 \cdots 0)^{N-2} = (0 \cdots 0fg)$. Define matrices $M_i(fg0 \cdots 0)$, $i =
0, \ldots, N$ in $G_{k,v}$ as follows:

$$\begin{aligned}
M_0(fg0 \cdots 0) &= M(g0 \cdots 0), \\
M_i(fg0 \cdots 0) &= M((fg0 \cdots 0)^{i-1}), \quad i = 1, \ldots, N - 1, \\
M_N(fg0 \cdots 0) &= M(0 \cdots 0f).
\end{aligned} \tag{43}$$

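Example: For $k = 1$, $v = 2$, the orbit $S(00, 00; 1, 1)$ is shown in Fig. 7.
The quadratic form $Q(S(00, 00; 1, 1))$ is given by

$$Q(S(00, 00; 1, 1)) = \frac{1}{8}\begin{pmatrix}
8 & -2 & 0 & -2 & -2 & 0 & -2 & 0 \\
-2 & 8 & -2 & 0 & 0 & -2 & 0 & -2 \\
0 & -2 & 8 & -2 & -2 & 0 & -2 & 0 \\
-2 & 0 & -2 & 8 & 0 & -2 & 0 & -2 \\
-2 & 0 & -2 & 0 & 8 & -2 & 0 & -2 \\
0 & -2 & 0 & -2 & -2 & 8 & -2 & 0 \\
-2 & 0 & -2 & 0 & 0 & -2 & 8 & -2 \\
0 & -2 & 0 & -2 & -2 & 0 & -2 & 8
\end{pmatrix}$$

$$= \frac{1}{8}\left(8 I_8 - 2(I \otimes I \otimes A + I \otimes A \otimes A + A \otimes A \otimes I + A \otimes I \otimes I)\right)
= \frac{1}{8}\left(8 I_8 - 2 \sum_{i=0}^{3} M_i(110)\right). \tag{44}$$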



Lemma 9: Let $S_{N+1}$ be the set of all error events of length $N + 1$ and let
$S(a, z; g, f)$ be the orbit of $G^*_{k,v}$ containing the error event $E(a, z; 0, g; 0,
f)$. Then,

$$(1)\quad 2^{v+k} Q(S(a, z; g, f)) = 2(N + 1) I_{2^{v+k}} - 2 \sum_{i=0}^{N} M_i(fg0 \cdots 0), \tag{45}$$

$$(2)\quad 2^{v+k}(2^k - 1)^2 Q(S_{N+1}) = 2(2^k - 1)^2 (N + 1) I_{2^{v+k}} - 2 \sum_{f, g \ne 0} \sum_{i=0}^{N} M_i(fg0 \cdots 0). \tag{46}$$

Fig. 7. The orbit $S(00, 00; 1, 1)$.

Proof: We calculate the contribution to Q(S(a, z; g, f)) made by pairs
of edges in component 0. This distance contribution is

$$\frac{1}{2^{v+k}} \sum_t \bigl[x(t + (0 a_1 \cdots a_{N-1})) - x(t + (g a_1 \cdots a_{N-1}))\bigr]^2 = \frac{1}{2^{v+k}}\, x^T \bigl[2 I_{2^{v+k}} - 2M(g0\cdots0)\bigr] x$$

as found in the proof of Lemma 8. Similarly, the contribution made
by pairs of edges in component N (the last component of the error
events) is

$$\frac{1}{2^{v+k}} \sum_t \bigl[x(t + (z_1 \cdots z_{N-1} 0)) - x(t + (z_1 \cdots z_{N-1} f))\bigr]^2 = \frac{1}{2^{v+k}}\, x^T \bigl[2 I_{2^{v+k}} - 2M(0\cdots0f)\bigr] x.$$

For $i = 1, \ldots, N - 1$, the contribution made by pairs of edges in
component i is

$$\begin{aligned}
&\frac{1}{2^{v+k}} \sum_t \bigl[x(t + (z_{N-i+1} \cdots z_{N-1}\, 0\, 0\, a_1 \cdots a_{N-i-1})) - x(t + (z_{N-i+1} \cdots z_{N-1}\, f\, g\, a_1 \cdots a_{N-i-1}))\bigr]^2 \\
&\qquad = \frac{1}{2^{v+k}} \Bigl[2 \sum_t x(t)^2 - 2 \sum_t x(t)\, x(t + (fg0\cdots0)^{i-1})\Bigr] \\
&\qquad = \frac{1}{2^{v+k}}\, x^T \bigl[2 I_{2^{v+k}} - 2M((fg0\cdots0)^{i-1})\bigr] x.
\end{aligned}$$

The sum of the contributions from all N + 1 components is

$$Q(S(a, z; g, f)) = \frac{1}{2^{v+k}} \Bigl[2(N + 1) I_{2^{v+k}} - 2 \sum_{i=0}^{N} M_i(fg0\cdots0)\Bigr].$$

This proves part (1). Observe that (45) is independent of a and z. We
obtain $Q(S^{N+1})$ by summing (45) over all pairs g, f of nonzero k tuples.
Since there are $(2^k - 1)^2$ such pairs,

$$(2^k - 1)^2\, 2^{v+k} Q(S^{N+1}) = 2(2^k - 1)^2 (N + 1) I_{2^{v+k}} - 2 \sum_{f, g \ne 0} \sum_{i=0}^{N} M_i(fg0\cdots0)$$

as required. □
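
Each step of the proof rests on the identity $\sum_t [x(t + u) - x(t + u^*)]^2 = x^T[2I - 2M(u + u^*)]x$. A minimal numerical check is sketched below, assuming that binary tuples are encoded as integers, that "+" is bitwise XOR, and that M(d) permutes labels by s -> s XOR d. The normalizing factor $1/2^{v+k}$ is omitted on both sides, and the function names are hypothetical.

```python
import random


def pair_contribution(x, u, u_star):
    """Sum over all shifts t of [x(t + u) - x(t + u*)]^2, with '+' as bitwise XOR."""
    return sum((x[t ^ u] - x[t ^ u_star]) ** 2 for t in range(len(x)))


def quadratic_form(x, d):
    """x^T [2I - 2M(d)] x, where M(d) permutes labels by s -> s XOR d."""
    n = len(x)
    return 2 * sum(val * val for val in x) - 2 * sum(x[s] * x[s ^ d] for s in range(n))


def check_identity(n_bits=3, trials=20, seed=0):
    """Compare both sides on random integer edge labels; True if they always agree."""
    rng = random.Random(seed)
    size = 1 << n_bits
    for _ in range(trials):
        x = [rng.randint(-5, 5) for _ in range(size)]
        u, u_star = rng.randrange(size), rng.randrange(size)
        if pair_contribution(x, u, u_star) != quadratic_form(x, u ^ u_star):
            return False
    return True


if __name__ == "__main__":
    print(check_identity())
```

Integer labels are used so that the comparison is exact; the identity itself holds for any real labels.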

Remarks: When k = 1, we must have f = g = 1 and so every form
$Q(S(a, z; g, f))$ is equal to $Q(S^{N+1})$. In this case, N = 1 + v and

$$Q(S^{N+1}) = \frac{1}{2^{v+1}} \Bigl[2(2 + v) I_{2^{v+1}} - 2 \sum_{i=0}^{v+1} M_i(110\cdots0)\Bigr]$$

[see the matrix given as (44)]. For k > 1, there are several choices for
f and g. Thus, for k = 2, v = 4, we have, with g = (1, 1) and f = (0, 1),

$$\begin{aligned}
Q(S(a, z; 11, 01)) = \frac{1}{64}\bigl[8I_{64} - 2\bigl(&I_4 \otimes I_4 \otimes (A \otimes A) \\
&+ I_4 \otimes (A \otimes A) \otimes (A \otimes I) \\
&+ (A \otimes A) \otimes (A \otimes I) \otimes I_4 \\
&+ (A \otimes I) \otimes I_4 \otimes I_4\bigr)\bigr],
\end{aligned}$$

while with g = (0, 1) and f = (1, 0) we get

$$\begin{aligned}
Q(S(a, z; 01, 10)) = \frac{1}{64}\bigl[8I_{64} - 2\bigl(&I_4 \otimes I_4 \otimes (A \otimes I) \\
&+ I_4 \otimes (A \otimes I) \otimes (I \otimes A) \\
&+ (A \otimes I) \otimes (I \otimes A) \otimes I_4 \\
&+ (I \otimes A) \otimes I_4 \otimes I_4\bigr)\bigr].
\end{aligned}$$

Theorem 3: If k divides v then the normalized minimum distance of any
convolutionally derived trellis code with $2^v$ states and rate k bits/channel
symbol satisfies

$$\frac{d_{\min}^2}{P} \le \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{v}{k}\right).$$



Proof: If Q is any weighted average of $Q(S^N)$ and $Q(S^{N+1})$, then by (8)
we have

$$\frac{d_{\min}^2}{P} \le 2^{v+k} \lambda_1(Q). \tag{47}$$

Let $\delta = 1/(2^{2k} - 1)$. Then $2(2^k - 1)\delta + (2^k - 1)^2 \delta = 1$. Define Q to be
the following weighted average of $Q(S^N)$ and $Q(S^{N+1})$:

$$Q = 2(2^k - 1)\delta\, Q(S^N) + (2^k - 1)^2 \delta\, Q(S^{N+1}).$$

Set

$$Q_N = 2^{v+k}(2^k - 1) Q(S^N)$$

and

$$Q_{N+1} = 2^{v+k}(2^k - 1)^2 Q(S^{N+1}).$$

Then by (47)

$$\frac{d_{\min}^2}{P} \le \delta\, \lambda_1(2Q_N + Q_{N+1}). \tag{48}$$



The eigenvectors, w(c), of $Q_N$ and $Q_{N+1}$ are in 1-1 correspondence with
binary vectors $c = (c_1, \ldots, c_N)$, where $c_i$, $i = 1, \ldots, N$, are k tuples.
By (41)

$$Q_N w(c) = 2^{k+1} \gamma(c) w(c), \tag{49}$$

where $\gamma(c)$ is the number of nonzero k tuples $c_i$. Introduce k tuples
$c_0 = c_{N+1} = 0$ and define

$$\alpha(c) = |\{i \mid c_i = 0,\ c_{i+1} \ne 0 \ \text{or}\ c_i \ne 0,\ c_{i+1} = 0\}|$$

and

$$\beta(c) = |\{i \mid c_i \ne 0 \ \text{and}\ c_{i+1} \ne 0\}|. \tag{50}$$

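There are $(N + 1) - \alpha(c) - \beta(c)$ indices i, $0 \le i \le N$, for which $c_i = c_{i+1} = 0$.
By (30) the eigenvalue of $Q_{N+1}$ associated with w(c) is

$$\begin{aligned}
&2(2^k - 1)^2 (N + 1) - 2 \sum_{f, g \ne 0} \sum_{i=0}^{N} (-1)^{c_i \cdot f + c_{i+1} \cdot g} \\
&\qquad = 2(2^k - 1)^2 (N + 1) - 2 \sum_{i=0}^{N} \Bigl(\sum_{f \ne 0} (-1)^{c_i \cdot f}\Bigr) \Bigl(\sum_{g \ne 0} (-1)^{c_{i+1} \cdot g}\Bigr).
\end{aligned} \tag{51}$$

Recall that the sum $\sum_{f \ne 0} (-1)^{c_i \cdot f}$ is $(2^k - 1)$ when $c_i = 0$, but equal to
$-1$ whenever $c_i \ne 0$. Hence (51) is equal to

$$\begin{aligned}
&2(2^k - 1)^2 (N + 1) - 2\bigl[((N + 1) - \alpha(c) - \beta(c))(2^k - 1)^2 - \alpha(c)(2^k - 1) + \beta(c)\bigr] \\
&\qquad = 2(2^k - 1)^2 (\alpha(c) + \beta(c)) + 2(2^k - 1)\alpha(c) - 2\beta(c) \\
&\qquad = 2^{k+1}\bigl(2^k(\alpha(c) + \beta(c)) - 2(\alpha(c) + \beta(c)) + \alpha(c)\bigr) \\
&\qquad = 2^{k+1}\bigl[2^k(\alpha(c) + \beta(c)) - (\alpha(c) + 2\beta(c))\bigr].
\end{aligned} \tag{52}$$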

Now, $\gamma(c)$ is the number of nonzero $c_i$'s. Since each nonzero $c_i$
appears in two pairs, $(c_{i-1}, c_i)$ and $(c_i, c_{i+1})$, we have

$$\alpha(c) + 2\beta(c) = 2\gamma(c). \tag{53}$$

Substitution in (52) shows that the eigenvalue in (51) of $Q_{N+1}$ associated
with w(c) is

$$2^{k+1}\bigl[2^k(2\gamma(c) - \beta(c)) - 2\gamma(c)\bigr]. \tag{54}$$

By (49) and (54) we have

$$\begin{aligned}
(2Q_N + Q_{N+1}) w(c) &= 2^{k+1}\bigl(2\gamma(c) + 2^k(2\gamma(c) - \beta(c)) - 2\gamma(c)\bigr) w(c) \\
&= 2^{2k+1}(2\gamma(c) - \beta(c))\, w(c).
\end{aligned} \tag{55}$$

There are $N - \gamma(c)$ indices i, $1 \le i \le N$, for which $c_i = 0$. Since every
$c_j$, $1 \le j \le N$, appears in the two pairs $(c_{j-1}, c_j)$ and $(c_j, c_{j+1})$, there are
at most $2 + 2(N - \gamma(c))$ indices i, $0 \le i \le N$, for which $c_i = 0$ or $c_{i+1} = 0$.
Hence

$$\beta(c) \ge (N + 1) - 2 - 2(N - \gamma(c)) = 2\gamma(c) - N - 1$$

and

$$2\gamma(c) - \beta(c) \le N + 1. \tag{56}$$

Now (48), (55), and (56) imply

$$\frac{d_{\min}^2}{P} \le \frac{2^{2k+1}}{2^{2k} - 1}(N + 1) = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{v}{k}\right). \qquad \square$$
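
The counting argument behind (53) and (56) depends only on which blocks $c_i$ are nonzero, so it can be checked exhaustively for small N. The sketch below is an illustration only: it encodes c by a tuple of 0/1 flags (1 meaning $c_i \ne 0$), and the helper names are hypothetical.

```python
from itertools import product


def pattern_counts(pattern):
    """Return (alpha, beta, gamma) for a zero/nonzero pattern of c_1, ..., c_N."""
    padded = (0,) + tuple(pattern) + (0,)          # c_0 = c_{N+1} = 0
    pairs = list(zip(padded, padded[1:]))          # (c_i, c_{i+1}) for i = 0..N
    alpha = sum(1 for a, b in pairs if a != b)     # exactly one member nonzero
    beta = sum(1 for a, b in pairs if a and b)     # both members nonzero
    gamma = sum(pattern)                           # number of nonzero blocks
    return alpha, beta, gamma


def max_two_gamma_minus_beta(N):
    """Largest value of 2*gamma(c) - beta(c) over all patterns of length N."""
    best = 0
    for pattern in product((0, 1), repeat=N):
        _, beta, gamma = pattern_counts(pattern)
        best = max(best, 2 * gamma - beta)
    return best


if __name__ == "__main__":
    for N in range(1, 9):
        print(N, max_two_gamma_minus_beta(N))
```

For every N tried, the maximum equals N + 1, which is the value used in the final step of the proof.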



Remarks: Equality can hold in (56). If N is odd, set $c_0 = c_2 = \cdots = c_{N+1} = 0$
and $c_1, c_3, \ldots, c_N \ne 0$. Then $\gamma(c) = (N + 1)/2$ and $\beta(c) = 0$.
(Observe that for k = 1, v = 2, the largest eigenvalue of the form
$2Q_N + Q_{N+1}$ is associated with eigenvector $(1, -1)^T \otimes (1, 1)^T \otimes (1, -1)^T$.)
If N is even, set $c_0 = c_3 = c_5 = c_7 = \cdots = c_{N+1} = 0$ and
$c_1, c_2, c_4, c_6, \ldots, c_N \ne 0$ to get $\gamma(c) = (N + 2)/2$ and $\beta(c) = 1$. Setting

$$\frac{2^{k+1}}{2^k - 1}\left(1 + \frac{v}{k}\right) = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{v}{k}\right)$$

yields $v = k(2^k - 1)$. If $v < k(2^k - 1)$, then Theorem 2 gives the stronger
bound; if $v > k(2^k - 1)$ then Theorem 3 gives the stronger bound. In
particular, for k = 1, Theorem 3 gives a stronger bound for any v > 1.
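
The crossover at $v = k(2^k - 1)$ is easy to confirm with exact rational arithmetic. In the sketch below the Theorem 2 bound is taken to be $\frac{2^{k+1}}{2^k - 1}(1 + v/k)$, the l = 0 case of (68), and the Theorem 3 bound is the one derived above; the function names are illustrative.

```python
from fractions import Fraction


def theorem2_bound(k, v):
    """Bound on d_min^2 / P of the Theorem 2 type (k divides v)."""
    return Fraction(2 ** (k + 1), 2 ** k - 1) * (1 + Fraction(v, k))


def theorem3_bound(k, v):
    """Bound on d_min^2 / P from Theorem 3 (k divides v)."""
    return Fraction(2 ** (2 * k + 1), 2 ** (2 * k) - 1) * (2 + Fraction(v, k))


if __name__ == "__main__":
    for k in (1, 2, 3):
        v0 = k * (2 ** k - 1)
        for v in (v0 - k, v0, v0 + k):
            print(k, v, theorem2_bound(k, v), theorem3_bound(k, v))
```

Below the crossover the first bound is smaller, at the crossover the two coincide, and above it the second is smaller.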

The bound given by Theorem 3 is obtained from the largest eigen- 
value of a particular weighted average of Q(S N ) and Q(S N+1 ). In 
Appendix A we use the duality theorem of linear programming to 
prove that no other weighted average of Q(S N ) and Q(S N+1 ) gives a 
stronger bound. 

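In Appendix B we prove that if $v = (N - 1)k + l$, where $0 \le l < k$,
then

$$\frac{d_{\min}^2}{P} \le \frac{2^{2(k-l)+1}}{2^{2(k-l)} - 1}\left(2 + \left\lfloor \frac{v}{k} \right\rfloor\right),$$

where $\lfloor y \rfloor$ denotes the integer part of y.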

VI. CONCLUSIONS 

Three upper bounds on the normalized minimum distance, $d_{\min}^2/P$,
have been given for trellis codes. The bound

$$\frac{d_{\min}^2}{P} \le 4\left(1 + \frac{v}{k}\right)$$

given in Theorem 1 is typical. This certainly provides nontrivial
information. For example, is it possible to gain 10 dB in minimum
distance using $2^6 = 64$ states at rate 1 bit/symbol? The answer is no.
Theorem 1 bounds the gain at 8.4 dB; Theorem 3 bounds the gain at
7.3 dB. Nevertheless, there still remain the questions of how tight
these bounds are and if they exhibit the "right" dependence on the
parameters v and k. For example, consider the normalized minimum
distance for block codes of length n, having $2^{nk}$ code words (k bits/
symbol). In that case, known upper bounds behave, for large n, like
$d^2/P \lesssim 2n/4^k$. Thus the linear dependence on v, a quantity analogous
to block length, appears correct. However, the true dependence on k
may be different from our bound. Table I gives upper and lower bounds
on the gain (in dB) that is possible at rate 1 bit/channel symbol. The
lower bounds arise from codes constructed by Ungerboeck.$^{4}$

Table I. Possible gains (in dB) at rate 1 bit/symbol

         Lower bound      Upper bounds
   v     (Ungerboeck)     Theorems 1 and 2     Theorem 3
   2     2.5              4.7                  4.3
   3     3                6.0                  5.2
   4     3.4              7.0                  6.0
   5     4.2              7.8                  6.7
   6     4.5              8.4                  7.3
   7     5.1              9.0                  7.8
   8     5.3              9.5                  8.2
   9     5.6              10.0                 8.6
  10     5.8              10.4                 9.0
  11     6                10.7                 9.4
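
The upper-bound columns of Table I can be regenerated from the Theorem 1 and Theorem 3 bounds at k = 1. The sketch below assumes that the gain is measured against uncoded ±1 transmission, for which $d^2/P = 4$ (the reference implied by the 3 dB figure quoted for distance 8 with P = 1); the function names are hypothetical.

```python
import math


def gain_db(bound, uncoded=4.0):
    """Gain in dB of a bound on d_min^2 / P over the assumed uncoded value d^2 / P = 4."""
    return 10 * math.log10(bound / uncoded)


def table_upper_bounds(v_values):
    """Return (v, Theorem 1 gain, Theorem 3 gain) at rate k = 1 bit/symbol."""
    rows = []
    for v in v_values:
        thm1 = 4 * (1 + v)            # 4 (1 + v/k) with k = 1
        thm3 = (8 / 3) * (2 + v)      # 2^(2k+1) / (2^(2k) - 1) * (2 + v/k) with k = 1
        rows.append((v, gain_db(thm1), gain_db(thm3)))
    return rows


if __name__ == "__main__":
    for v, g1, g3 in table_upper_bounds(range(2, 12)):
        print(f"{v:2d}  {g1:5.2f}  {g3:5.2f}")
```

The computed gains agree with the tabulated entries to within 0.1 dB; the printed values appear to be rounded or truncated to one decimal place.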

Also, minimum distance is by no means the complete story with
regard to error rate. The heuristics leading to the claim that terms
involving $d_{\min}$ would dominate an upper bound on the error rate make
the assumption that the infinite series determining the upper bound
converges. Even if a code with a good $d_{\min}$ were found, an upper bound
on error rate should still be computed for that particular code. As an
example of a catastrophe that may occur, consider the assignment of
edge labels $x^T = (1, -1, -1, 1, -1, 1, 1, -1)$ to the trellis of Fig. 1. One
observes that a pair of edges leaving a node always contributes
$(1 - (-1))^2 = 4$ to the distance and similarly for a pair of edges merging
into a node. One immediately concludes that no error event has
distance less than 8 for this edge assignment. Since P = 1, this is a 3
dB gain over the uncoded ±1 situation. How could this happen with
only ±1 symbols? One answer is that we forgot to include unmerged
events, events which go on forever. We had implicitly assigned infinity
to their distance, but now some have distance 4. However, this could
be rectified by perturbing the ±1 edge labels by small amounts. A
more serious trouble with this code is that an infinite number of error
events have (essentially) the minimum distance and so a coefficient
that we did not explicitly consider turns out to be infinite for this
particular code.

APPENDIX A

The upper bound of Theorem 3 is obtained from the largest eigenvalue
of a particular weighted average of the quadratic forms $Q(S^N)$
and $Q(S^{N+1})$. In this appendix we prove that no other weighted average
gives a stronger bound. We shall assume throughout that $v \ge k(2^k - 1)$
since the bound given in Theorem 3 improves upon that given in
Theorem 2 only for v in this range.

If $r_1, r_2 \ge 0$ and $r_1 + r_2 = 1$, then

$$\frac{d_{\min}^2}{P} \le 2^{v+k} \lambda_1\bigl(r_1 Q(S^N) + r_2 Q(S^{N+1})\bigr).$$

Recall from (29) that the eigenvectors w(c) of $Q(S^N)$ and $Q(S^{N+1})$ are in
1:1 correspondence with binary v + k tuples c. Let $c = (c_1, \ldots, c_N)$,
where $c_i$, $i = 1, \ldots, N$, is a binary k tuple and let $c_0 = c_{N+1} = 0$. Recall
that

$$\alpha(c) = |\{i \mid c_i = 0,\ c_{i+1} \ne 0 \ \text{or}\ c_i \ne 0,\ c_{i+1} = 0\}|,$$

$$\beta(c) = |\{i \mid c_i \ne 0 \ \text{and}\ c_{i+1} \ne 0\}|,$$

and

$$\gamma(c) = |\{i \mid c_i \ne 0\}|.$$

Define $\phi_N(c)$ and $\phi_{N+1}(c)$ by $2^{v+k}(2^k - 1) Q(S^N) w(c) = \phi_N(c) w(c)$ and
$2^{v+k}(2^k - 1)^2 Q(S^{N+1}) w(c) = \phi_{N+1}(c) w(c)$. Then by (49) and (54)

$$\phi_N(c) = 2^{k+1} \gamma(c) \tag{57}$$

and

$$\phi_{N+1}(c) = 2^{k+1}\bigl[2^k(2\gamma(c) - \beta(c)) - 2\gamma(c)\bigr]. \tag{58}$$

To find the optimal weighted average we have to solve the following
linear programming problem.

Choose real variables $r_1, r_2, r \ge 0$ so as to minimize r subject to the
inequalities

$$-(r_1 + r_2) \le -1 \tag{59}$$

and

$$\frac{\phi_N(c)}{2^k - 1}\, r_1 + \frac{\phi_{N+1}(c)}{(2^k - 1)^2}\, r_2 - r \le 0 \quad \text{for all } v + k \text{ tuples } c.$$

In Theorem 3 we proved that a feasible solution to (59) is

$$r_1 = \frac{2}{2^k + 1}, \qquad r_2 = \frac{2^k - 1}{2^k + 1}, \qquad r = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{v}{k}\right). \tag{60}$$



The linear program (59) is the dual of the primal linear program given
below.

Choose real variables $a_c, a \ge 0$, where the index c runs through all
binary v + k tuples, so as to maximize a subject to the inequalities

$$\begin{aligned}
\sum_c a_c &\le 1, \\
\frac{1}{2^k - 1}\Bigl(\sum_c \phi_N(c)\, a_c\Bigr) - a &\ge 0, \\
\frac{1}{(2^k - 1)^2}\Bigl(\sum_c \phi_{N+1}(c)\, a_c\Bigr) - a &\ge 0.
\end{aligned} \tag{61}$$

If we can find a feasible solution to (61) with

$$a = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{v}{k}\right),$$

then by the duality theorem of linear programming,$^{2}$ (60) is an optimal
solution to (59). We consider two cases.

Case 1. N odd

Pick $f = (f_1, \ldots, f_N)$, where $f_i$, $i = 1, \ldots, N$, is a binary k tuple and
every $f_i$ is nonzero. Pick $g = (g_1, \ldots, g_N)$, where $g_i$, $i = 1, \ldots, N$, is a
binary k tuple and $g_i \ne 0$ if and only if i is odd. Then $\gamma(f) = N$, $\beta(f) = N - 1$
and $\gamma(g) = (N + 1)/2$, $\beta(g) = 0$. By (57) and (58), $\phi_N(f)$,
$\phi_N(g)$, $\phi_{N+1}(f)$, and $\phi_{N+1}(g)$ are as follows:

$$\begin{array}{c|cc}
 & f & g \\ \hline
\phi_N & 2^{k+1} N & 2^k (N + 1) \\
\phi_{N+1} & 2^{k+1}\bigl[2^k(N + 1) - 2N\bigr] & 2^{k+1}\bigl[(2^k - 1)(N + 1)\bigr]
\end{array}$$

Set

$$a_c = \begin{cases}
a_f = \left(\dfrac{2^k - 1}{2^k + 1}\right)\left(\dfrac{N + 1}{N - 1}\right) & \text{if } c = f, \\[2ex]
1 - a_f = \dfrac{2(N - 2^k)}{(2^k + 1)(N - 1)} & \text{if } c = g, \\[2ex]
0 & \text{otherwise,}
\end{cases}
\qquad
a = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{v}{k}\right). \tag{62}$$

Direct calculation shows that (62) is a feasible solution to (61). (Since
$v \ge k(2^k - 1)$, the variables $a_c$ are all nonnegative.)

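Case 2. N even

Pick $h = (h_1, \ldots, h_N)$, where $h_i$, $i = 1, \ldots, N$, is a binary k tuple,
$h_3 = h_5 = h_7 = h_9 = \cdots = h_{N-1} = 0$, and $h_1, h_2, h_4, h_6, h_8, \ldots, h_N$ are
nonzero. Then $\gamma(h) = (N + 2)/2$, $\beta(h) = 1$ and, by (57) and (58),

$$\phi_N(h) = 2^k(N + 2) \quad \text{and} \quad \phi_{N+1}(h) = 2^{k+1}\bigl[(2^k - 1)(N + 2) - 2^k\bigr].$$

With f the k-tuple sequence in which every $f_i$ is nonzero, as in Case 1, set

$$a_c = \begin{cases}
a_f = \dfrac{(2^k - 1)N - 2}{(2^k + 1)(N - 2)} & \text{if } c = f, \\[2ex]
1 - a_f = \dfrac{2(N - 2^k)}{(2^k + 1)(N - 2)} & \text{if } c = h, \\[2ex]
0 & \text{otherwise,}
\end{cases}
\qquad
a = \frac{2^{2k+1}}{2^{2k} - 1}\left(2 + \frac{v}{k}\right). \tag{63}$$

Direct calculation shows that (63) is a feasible solution to (61). (Again
since $v \ge k(2^k - 1)$, the variables $a_c$ are all nonnegative.)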

We have now shown that (60) is an optimal solution to (59). 
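
The "direct calculation" for the two primal points can be sketched with exact fractions. The code below evaluates the two eigenvalue constraints of the primal program at the points given for N odd and N even, using the values of $\gamma$ and $\beta$ stated for the vectors f, g, and h, and compares them with the target value $2^{2k+1}(N + 1)/(2^{2k} - 1)$. The function names are illustrative, and the weights are taken to sum to one, as the expressions $a_f$ and $1 - a_f$ indicate.

```python
from fractions import Fraction


def phi_n(k, gamma):
    """phi_N(c) = 2^(k+1) * gamma(c), as in (57)."""
    return 2 ** (k + 1) * gamma


def phi_n1(k, gamma, beta):
    """phi_{N+1}(c) = 2^(k+1) * [2^k (2 gamma - beta) - 2 gamma], as in (58)."""
    return 2 ** (k + 1) * (2 ** k * (2 * gamma - beta) - 2 * gamma)


def primal_values(k, N):
    """Return (a_f, lhs1, lhs2, target) for the primal point with N >= 3."""
    q = 2 ** k
    gamma_f, beta_f = N, N - 1                       # f: every block nonzero
    if N % 2 == 1:                                   # Case 1: second vector g
        gamma_2, beta_2 = Fraction(N + 1, 2), 0
        a_f = Fraction((q - 1) * (N + 1), (q + 1) * (N - 1))
    else:                                            # Case 2: second vector h
        gamma_2, beta_2 = Fraction(N + 2, 2), 1
        a_f = Fraction((q - 1) * N - 2, (q + 1) * (N - 2))
    a_2 = 1 - a_f
    lhs1 = (phi_n(k, gamma_f) * a_f + phi_n(k, gamma_2) * a_2) / (q - 1)
    lhs2 = (phi_n1(k, gamma_f, beta_f) * a_f
            + phi_n1(k, gamma_2, beta_2) * a_2) / (q - 1) ** 2
    target = Fraction(2 ** (2 * k + 1) * (N + 1), 2 ** (2 * k) - 1)
    return a_f, lhs1, lhs2, target


if __name__ == "__main__":
    for k, N in [(1, 3), (1, 4), (2, 5), (2, 6)]:
        print(k, N, primal_values(k, N))
```

In every case tried with $N \ge 2^k$, both constraints hold with equality at the target value and the weights lie between 0 and 1.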



APPENDIX B 

In this appendix we extend Theorems 1, 2, and 3 to the case when
k does not divide v. Setting $v = (N - 1)k + l$, where $0 \le l < k$, we have
$N = \lfloor (v + k)/k \rfloor$, where $\lfloor y \rfloor$ denotes the integer part of y.

Encoder states are labelled with binary v tuples in the way described
in Section II. Edges of the trellis are labelled with real numbers x(s),
where s is a binary v + k tuple. The group $G_{k,v}$ is defined in the way
described in Section II; for each binary v + k tuple t, we define a
permutation of the edge labels x(s) by the rule

$$g_t(x(s)) = x(s + t).$$

The symmetry $g_t^*$ is the sequence

$$g_t^* = (g_{t^0}, g_{t^1}, g_{t^2}, \ldots),$$

where $t^0 = t$ and $t^i$ is obtained from $t^{i-1}$ by cycling the entries k bits
to the right and moving the last k bits to the front. When k = 2 and
v = 3,

$$g^*_{11001} = (g_{11001}, g_{01110}, g_{10011}, g_{11100}, g_{00111}, g_{11001}, \ldots).$$

In general, $t^i = t^{d+i}$ where $d = (k + v)/\gcd(k, k + v)$. Given any
component of the trellis and any pair of edges x(s), x(t) in that
component there is a unique element of $G^*_{k,v}$ interchanging x(s) and
x(t). The proof of Theorem 1 goes through without change and we
have

$$\frac{d_{\min}^2}{P} \le 4\hat{N}, \tag{64}$$

where $\hat{N}$ is the minimal length of an error event.
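
The cycling rule that defines the sequence $t^0, t^1, t^2, \ldots$ when k does not divide v can be illustrated directly. The sketch below represents a binary tuple as a string of 0s and 1s; the function names are hypothetical.

```python
from math import gcd


def cycle_right(t, k):
    """Cycle the entries of t by k bits to the right; the last k bits move to the front."""
    return t[-k:] + t[:-k]


def symmetry_sequence(t, k, length):
    """Return the first `length` tuples t^0, t^1, ... indexing the symmetry g_t^*."""
    seq = []
    for _ in range(length):
        seq.append(t)
        t = cycle_right(t, k)
    return seq


def period(k, v):
    """d = (k + v) / gcd(k, k + v), so that t^i = t^(d + i)."""
    return (k + v) // gcd(k, k + v)


if __name__ == "__main__":
    print(symmetry_sequence("11001", 2, 6))
    print(period(2, 3))
```

For k = 2, v = 3 and t = 11001 this reproduces the sequence 11001, 01110, 10011, 11100, 00111, 11001 with period d = 5.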

To see that $\hat{N} = N$, consider an error event E with initial state
(time t = 0) $a = (a_1 \cdots a_{N-1} a_N)$, where $a_1, \ldots, a_{N-1}$ are k tuples and
$a_N$ is an l tuple. If k tuples $b_1, b_1^*$ are input at time 0 then at time 1 the
two paths occupy states $(b_1, a_1, \ldots, a_{N-2}, \tilde{a}_{N-1})$ and
$(b_1^*, a_1, \ldots, a_{N-2}, \tilde{a}_{N-1})$, where $\tilde{s}$ denotes the l tuple $(s_1 \cdots s_l)$ obtained
from the k tuple $(s_1 \cdots s_k)$ by deleting the last k - l bits. At time 1 the
k tuple $z_N c$ is input to both paths, where c is a fixed but arbitrary
k - l tuple. At time 2 the two paths occupy states
$(z_N c, b_1, a_1, \ldots, a_{N-3}, \tilde{a}_{N-2})$ and $(z_N c, b_1^*, a_1, \ldots, a_{N-3}, \tilde{a}_{N-2})$.
At time N, after inputs $z_{N-1}, \ldots, z_2$, the two paths occupy states
$(z_2, z_3, \ldots, z_{N-1}, z_N c, \tilde{b}_1)$ and $(z_2, z_3, \ldots, z_{N-1}, z_N c, \tilde{b}_1^*)$.
If $\tilde{b}_1 = \tilde{b}_1^*$ then the two paths remerge at time N in state
$z = (z_2, z_3, \ldots, z_{N-1}, z_N c, \tilde{b}_1)$. We denote this error event by
$E(a, z; b_1, b_1^*)$. Thus by (64)

$$\frac{d_{\min}^2}{P} \le 4\left(1 + \left\lfloor \frac{v}{k} \right\rfloor\right) \tag{65}$$

for general k and v.

Let $S(a, z; b_1, b_1^*)$ be the orbit of $G^*_{k,v}$ containing the error event
$E(a, z; b_1, b_1^*)$. We calculate the contribution to $Q(S(a, z; b_1, b_1^*))$ made
by pairs of edges in component 0 in the same way as Lemma 8. Setting
$f = b_1 + b_1^*$, this distance contribution is

$$\frac{1}{2^{v+k}} \sum_t \bigl[g_t(x(b_1 a_1 \cdots a_{N-1} a_N)) - g_t(x(b_1^* a_1 \cdots a_{N-1} a_N))\bigr]^2 = \frac{1}{2^{v+k}}\, x^T \bigl[2 I_{2^{v+k}} - 2M(f0\cdots0)\bigr] x.$$

Similarly, the distance contribution made by pairs of edges in
component N - 1 is

$$\frac{1}{2^{v+k}}\, x^T \bigl[2 I_{2^{v+k}} - 2M((f0\cdots0)^{N-1})\bigr] x.$$

Note that the first l bits of f are zero and that the last l bits of
$(f0\cdots0)^i$ are zero for $0 \le i \le N - 1$. Arguing as in Lemma 8 we
obtain

$$2^{v+k} Q(S(a, z; b_1, b_1^*)) = 2N I_{2^{v+k}} - 2 \sum_{i=0}^{N-1} M((f0\cdots0)^i).$$

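There are $(2^{k-l} - 1)$ k tuples f for which $f \ne 0$ and $\tilde{f} = 0$. Hence

$$2^{v+k}(2^{k-l} - 1) Q(S^N) = 2(2^{k-l} - 1) N I_{2^{v+k}} - 2 \sum_{\substack{f \ne 0 \\ \tilde{f} = 0}} \sum_{i=0}^{N-1} M((f0\cdots0)^i).$$

Setting $Q = 2^{v+k}(2^{k-l} - 1) Q(S^N)$ we obtain

$$\frac{d_{\min}^2}{P} \le \frac{1}{2^{k-l} - 1} \lambda_1(Q), \tag{66}$$

which reduces to (39) when l = 0. The proof of Theorem 2 goes through
(change "$c_i = 0$ ($\ne 0$)" to "the last k - l digits of $c_i$ are zero (nonzero)")
and gives

$$\lambda_1(Q) = 2^{k-l+1} N. \tag{67}$$

By (66) and (67)

$$\frac{d_{\min}^2}{P} \le \frac{2^{k-l+1}}{2^{k-l} - 1}\left(1 + \left\lfloor \frac{v}{k} \right\rfloor\right) \tag{68}$$

for general k and v.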



Finally we consider the set $S'$ of all error events of length N + 1 for
which the k tuples $b_1, b_1^*$ input at time 0 satisfy $\tilde{b}_1 = \tilde{b}_1^* = 0$ and for
which the k tuples $b_2, b_2^*$ input at time 1 satisfy $\tilde{b}_2 = \tilde{b}_2^* = 0$. Let
$E \in S'$ with initial state $a = (a_1, \ldots, a_{N-1}, a_N)$ and final state
$z = (z_1, \ldots, z_{N-1}, z_N)$, where $a_i, z_i$, $i = 1, \ldots, N - 1$, are k tuples and $a_N, z_N$
are l tuples. At time 2 the two paths occupy states
$((z_N 0 \cdots 0) + b_2, b_1, a_1, \ldots, a_{N-3}, \tilde{a}_{N-2})$ and
$((z_N 0 \cdots 0) + b_2^*, b_1^*, a_1, \ldots, a_{N-3}, \tilde{a}_{N-2})$.
At time N the two paths occupy states
$(z_2, z_3, \ldots, z_{N-1}, (z_N 0 \cdots 0) + b_2, \tilde{b}_1)$ and
$(z_2, z_3, \ldots, z_{N-1}, (z_N 0 \cdots 0) + b_2^*, \tilde{b}_1^*)$. Set $f = b_2 + b_2^*$
and $g = b_1 + b_1^*$ and define $M_i(fg0\cdots0)$, $i = 0, \ldots, N$, as in (43). Arguing as
in Lemma 9 we obtain

$$2^{v+k}(2^{k-l} - 1)(2^{k-l} - 1) Q(S') = 2(2^{k-l} - 1)(2^{k-l} - 1)(N + 1) I_{2^{v+k}} - 2 \sum_{\substack{f \ne 0 \\ \tilde{f} = 0}} \sum_{\substack{g \ne 0 \\ \tilde{g} = 0}} \sum_{i=0}^{N} M_i(fg0\cdots0).$$

Let $\delta = 1/(2^{2(k-l)} - 1)$ and let $Q = 2(2^{k-l} - 1)\delta\, Q(S^N) + (2^{k-l} - 1)^2 \delta\, Q(S')$.
Then $2(2^{k-l} - 1)\delta + (2^{k-l} - 1)^2 \delta = 1$ and so

$$\frac{d_{\min}^2}{P} \le 2^{v+k} \lambda_1(Q).$$

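The proof of Theorem 3 goes through [change "$c_i = 0$ ($\ne 0$)" to "the
last k - l digits of $c_i$ are zero (nonzero)"] and we obtain

$$\frac{d_{\min}^2}{P} \le \frac{2^{2(k-l)+1}}{2^{2(k-l)} - 1}\left(2 + \left\lfloor \frac{v}{k} \right\rfloor\right) \tag{69}$$

for general k and v.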